Revert log level changes #3913
Closes: #3906
Did this cause any problem, @bkchr? If not, why wouldn't we want node operators to understand whether their validators are properly connected? I guess we kind of disagree, both on this and on #3677, about the claim that this information is not relevant for nodes.
Let me try to see if I can change your mind :). At least from my perspective, part of operating a system with minimal downtime is the ability to understand what happened in production after it happened. Metrics are part of the story for that, but we also need the default log level to give us enough information to pinpoint problems, or at least to know for sure that certain things are working correctly. A lot of our bug reports include just the Import #<block_number> log; it is really hard to figure out from that alone what happened, and for certain categories of bugs you can't go and tell them to enable Hence, I was thinking we should actually go in the opposite direction and rethink a bit the information our INFO level prints, so that it is easier to understand what is going on. I'm aware that too much logging might cause other problems, but I don't think that was the issue with this line, because it is output every 10 minutes.
I get what you want to achieve. However, printing all the authorities that you are not connected to is debug information. If you go ahead and just print the connectivity to all authorities as a percentage every 10 minutes, that is fine. Bad connectivity is almost never a result of what you are doing locally, unless you are having these issues because you regenerated your node key on every restart and authority discovery then did not tell the others your new key. However, this is a bug and nothing the operator can change. They also could not change anything about these missing authorities, as they are not required to open slots in their firewall for each outgoing connection. Yes, they could have configured their server in this way, but then they already failed there. All in all, printing all this information versus just printing the connectivity as a percentage every 10 minutes doesn't really make any difference.
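To make the suggestion above concrete, here is a minimal sketch of a percentage-only summary at INFO level, with the list of unconnected authorities kept at DEBUG. It is written in Python for illustration only and is not the node's actual code; the names `connectivity_summary` and `log_connectivity`, the message wording, and the choice to report 100% for an empty authority set are all assumptions.

```python
import logging

logger = logging.getLogger("connectivity")  # hypothetical logger name


def connectivity_summary(connected, authorities):
    """Return (summary line, sorted list of authorities we are not connected to)."""
    authority_set = set(authorities)
    total = len(authority_set)
    # Peers in `connected` that are not authorities are ignored.
    missing = sorted(authority_set - set(connected))
    reachable = total - len(missing)
    # Assumption: with no authorities at all, report 100% rather than divide by zero.
    percent = 100.0 if total == 0 else 100.0 * reachable / total
    summary = f"Connectivity to authorities: {percent:.1f}% ({reachable}/{total})"
    return summary, missing


def log_connectivity(connected, authorities):
    """Log the percentage at INFO and the detailed list only at DEBUG."""
    summary, missing = connectivity_summary(connected, authorities)
    logger.info(summary)
    if missing:
        logger.debug("Not connected to authorities: %s", ", ".join(missing))
```

Called from a 10-minute timer, `log_connectivity` would emit one short line at the default level, while operators who enable debug logging would still see which authorities are missing.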
It doesn't make a difference for that particular node, but it does for the whole network and for anyone trying to find out which nodes are the ones not properly connected, since they would show up repeatedly in the logs of all the others. Anyway, this doesn't matter much because we figured out the issue; the next one will probably be in other parts of the system. That's why I think that maybe we need to rethink our rules for what we output as
Yes, that would be golden :D.
And then? You see some IDs and don't know who they are. Even if you resolve them to some validator name, it still doesn't solve any problem. You could just write a dedicated tool that queries the DHT and then tries to connect to all validators.